*Load complete dataset
use "/Users/schoon.1/Dropbox/Documents/- Research/- In progress/Terrorism in the Media/ALL THE TERROR DATA COMPLETE/1970-2019 Media and Orgs (final).dta", clear

*Drop variables that are in the old dataset but need to be re-constructed
drop news_total terror_total perc_terror perc_terror_lag1 us uk perc_ap perc_nytimes perc_times perc_host

*Construct analysis specific variables from new data
egen news_total = rowtotal(nytimes ap times)
*rowtotal() returns zero even when all inputs are missing, so reset those cases to missing:
replace news_total = . if nytimes==.&ap==.&times==.
label var news_total "Total news coverage, rowtotal(nytimes ap times)"

egen terror_total = rowtotal(nytimes_terror ap_terror times_terror)
*rowtotal() returns zero even when all inputs are missing, so reset those cases to missing:
replace terror_total = . if nytimes_terror==.&ap_terror==.&times_terror==.
label var terror_total "Total terrorism coverage, rowtotal(nytimes_terror ap_terror times_terror)"

gen perc_terror = terror_total/news_total
label var perc_terror "Percent coverage referencing terrorism (terror_total/news_total)"

gen perc_nytimes = nytimes_terror/nytimes
label var perc_nytimes "Percent terrorism coverage, NYTimes (nytimes_terror/nytimes)"

gen perc_ap = ap_terror/ap
label var perc_ap "Percent terrorism coverage, AP (ap_terror/ap)"

gen perc_times = times_terror/times
label var perc_times "Percent terrorism coverage, Times (times_terror/times)"

gen perc_host = ishostkid/nattacks
label var perc_host "Percent attacks that are hostage/kidnappings"


*Sort by year within group so that [_n-1] picks up the group's previous observation in time
sort group_id iyear
by group_id: gen perc_terror_lag = perc_terror[_n-1]
label var perc_terror_lag "Percent coverage referencing terrorism, one year lag"

by group_id: gen perc_nytimes_lag = perc_nytimes[_n-1]
label var perc_nytimes_lag "Percent NYTimes coverage referencing terrorism, one year lag"

by group_id: gen perc_ap_lag = perc_ap[_n-1]
label var perc_ap_lag "Percent AP coverage referencing terrorism, one year lag"

by group_id: gen perc_times_lag = perc_times[_n-1]
label var perc_times_lag "Percent Times coverage referencing terrorism, one year lag"


replace left = 0 if left==.
replace nat = 0 if nat==.
replace rel = 0 if rel==.
replace right=0 if right==.

gen us = 0
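*Missing values compare as larger than any number in Stata, so exclude them explicitly
replace us = 1 if us_location>1 & !missing(us_location)
replace us = 1 if us_victim>1 & !missing(us_victim)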
label var us "US Target/Victim (binary)"

gen uk = 0
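*Missing values compare as larger than any number in Stata, so exclude them explicitly
replace uk = 1 if uk_location>1 & !missing(uk_location)
replace uk = 1 if uk_victim>1 & !missing(uk_victim)
label var uk "UK Target/Victim (binary)"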

*Now I'm going to recode the GTD variables for use in statistical analysis
egen nvictims = rowtotal(nkill nwound nhostkid)
label var nvictims "Total victims (hostages, kidnapping, wounded, killed)"

egen znvictims = std(nvictims)
label var znvictims "Total victims (standardized)"

egen znattacks = std(nattacks)
label var znattacks "Total attacks (standardized)"

*Check for coding problems via descriptive statistics
sum news_total perc_terror perc_gov perc_suicide perc_host ///
perc_ap perc_nytimes perc_times ///
perc_ap_lag perc_nytimes_lag perc_times_lag perc_terror_lag ///
us uk ///
nat rel right ///
nvictims

*Percent terrorism coverage variables came back with problematic observations. So, let's identify those observations:
list gname iyear news_total terror_total if perc_terror>1&perc_terror!=.

list gname iyear ap ap_terror if perc_ap>1&perc_ap!=.

list gname iyear nytimes nytimes_terror if perc_nytimes>1&perc_nytimes!=.

list gname iyear times times_terror if perc_times>1&perc_times!=.

*There are five problematic observations on perc_terror, five for ap, and two each for nytimes and times, indicating a coding problem (more terrorism coverage instances than total coverage). Because LexisNexis has changed its search function, we can't replicate the data collection procedure exactly, so I'll drop these observations.
drop if perc_terror>1&perc_terror!=.
drop if perc_ap>1&perc_ap!=.
drop if perc_nytimes>1&perc_nytimes!=.
drop if perc_times>1&perc_times!=.
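
*A quick check, offered as a sketch rather than as part of the original procedure: after the
*drops above, no non-missing share should exceed 1. assert stops the do-file if one does.
assert perc_terror<=1 if !missing(perc_terror)
assert perc_ap<=1 if !missing(perc_ap)
assert perc_nytimes<=1 if !missing(perc_nytimes)
assert perc_times<=1 if !missing(perc_times)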

*The lag variables were built before these drops and still carry the dropped values, so I'll drop and re-construct them:
drop perc_terror_lag perc_nytimes_lag perc_ap_lag perc_times_lag

*Sort by year within group so that [_n-1] picks up the group's previous observation in time
sort group_id iyear
by group_id: gen perc_terror_lag = perc_terror[_n-1]
label var perc_terror_lag "Percent coverage referencing terrorism, one year lag"

by group_id: gen perc_nytimes_lag = perc_nytimes[_n-1]
label var perc_nytimes_lag "Percent NYTimes coverage referencing terrorism, one year lag"

by group_id: gen perc_ap_lag = perc_ap[_n-1]
label var perc_ap_lag "Percent AP coverage referencing terrorism, one year lag"

by group_id: gen perc_times_lag = perc_times[_n-1]
label var perc_times_lag "Percent Times coverage referencing terrorism, one year lag"
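
*Sketch of a check on the lag construction (not part of the original procedure): [_n-1]
*returns the previous observation within a group, which is the previous year only when the
*group has no gaps in iyear. This assumes iyear is the year variable, as in the list commands
*above. The count reports how many observations do not directly follow the previous year.
tempvar yeargap
bysort group_id (iyear): gen `yeargap' = iyear - iyear[_n-1] if _n>1
count if `yeargap'!=1 & !missing(`yeargap')
*Drop the helper explicitly so it cannot end up in the saved dataset
drop `yeargap'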


*Re-checking descriptive statistics:
sum news_total perc_terror perc_gov perc_suicide perc_host ///
perc_ap perc_nytimes perc_times ///
perc_ap_lag perc_nytimes_lag perc_times_lag perc_terror_lag ///
us uk ///
nat rel right ///
nvictims


* Everything looks good. Now, I need to generate the group means and group-mean-centered versions of the variables
egen terror_gpm=mean(perc_terror_lag), by(gname)
gen terror_gpmc=perc_terror_lag - terror_gpm
label var terror_gpmc "Percent terrorism coverage (group mean centered; 1-year lag)"

egen ap_terror_gpm=mean(perc_ap_lag), by(gname)
gen ap_terror_gpmc=perc_ap_lag - ap_terror_gpm
label var ap_terror_gpmc "Percent AP terrorism coverage (group mean centered; 1-year lag)"

egen nytimes_terror_gpm=mean(perc_nytimes_lag), by(gname)
gen nytimes_terror_gpmc=perc_nytimes_lag - nytimes_terror_gpm
label var nytimes_terror_gpmc "Percent NYTimes terrorism coverage (group mean centered; 1-year lag)"

egen times_terror_gpm=mean(perc_times_lag), by(gname)
gen times_terror_gpmc=perc_times_lag - times_terror_gpm
label var times_terror_gpmc "Percent Times terrorism coverage (group mean centered; 1-year lag)"

egen govtarg_gpm=mean(perc_gov), by(gname)
gen govtarg_gpmc=perc_gov - govtarg_gpm
label var govtarg_gpmc "Percent government targets (group mean centered)"

egen suicide_gpm=mean(perc_suicide), by(gname)
gen suicide_gpmc=perc_suicide - suicide_gpm
label var suicide_gpmc "Percent suicide (group mean centered)"

egen us_gpm=mean(us), by(gname)
gen us_gpmc=us-us_gpm
label var us_gpmc "US target (group mean centered)"

egen uk_gpm=mean(uk), by(gname)
gen uk_gpmc=uk-uk_gpm
label var uk_gpmc "UK target (group mean centered)"

egen nattacks_gpm=mean(znattacks), by(gname)
gen nattacks_gpmc= znattacks - nattacks_gpm
label var nattacks_gpmc "Number of Attacks (standardized; group mean centered)"

egen nvictims_gpm=mean(znvictims), by(gname)
gen nvictims_gpmc=znvictims - nvictims_gpm
label var nvictims_gpmc "Victims (nkill, nhost/kid, nwounded; standardized; group mean centered)"

egen hostage_gpm=mean(perc_host), by(gname)
gen hostage_gpmc=perc_host - hostage_gpm
label var hostage_gpmc "Percent Hostage/Kidnapping (group mean centered)"
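
*Sketch of a check on the centering (not part of the original procedure): within each group,
*a group-mean-centered variable should average to roughly zero, up to floating-point rounding.
*The example uses terror_gpmc; the same pattern applies to the other _gpmc variables.
tempvar gpmc_mean
egen `gpmc_mean' = mean(terror_gpmc), by(gname)
sum `gpmc_mean'
*Drop the helper explicitly so it cannot end up in the saved dataset
drop `gpmc_mean'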


save ATFTP.dta, replace
